State of Research

Past literature on democratization, nationalism, and autocratic regime maintenance illustrates how education brings both risk and reward for non-democratic states. On the one hand, education is linked with pro-democratic attitudes, political disengagement, and autocratic failure, and autocrats are predicted to be hesitant to invest in disenfranchised populations. On the other hand, education has been found to bolster national loyalty, human capital, and long-term development. The real-world pattern is no clearer: autocratic states vary significantly in educational investment and attainment, and in the relationship between education and political participation. I argue that education can sustain or compromise autocratic stability depending on two factors: the ethnic composition of the state and the extent to which the state uses propaganda in schools. Education does not have a uniform effect. It will not instill similarly pro-democratic attitudes across a diverse population, even if the education “treatment” is constant. At the state level, similar educational policies and initiatives can have opposite outcomes across autocratic states, jeopardizing or strengthening autocratic stability. Similarly, at the individual level, increased education can lead individuals to support or oppose the autocratic state. My dissertation investigates the factors behind this real-world variation.

My research currently focuses on three inter-related questions:

  1. When does education strengthen or weaken national identity?
  2. When does education lead to autocratic stability or democratization?
  3. Do inclusive educational policies that recognize previously marginalized cultures/languages foster a shared or divisive national identity?

The dataset constructed below is aimed at Questions 1 and 2.

When does education lead to democratization and autocratic failure? An extensive literature highlights the relationship between education and democratization (Lipset, 1959; Lerner, 1958; Benavot, 1996; Glaeser et al., 2007; Sanborn and Thyne, 2014). My dissertation argues that investigating ethnic politics changes our understanding of this relationship. My theory highlights the risks and rewards of increasing educational attainment in autocratic states, conditional upon pre-existing ethno-political power relations. My primary causal mechanism suggests that marginalized groups, as they become more educated, will develop stronger pro-democratic attitudes than similarly educated advantaged groups; education is therefore more likely to lead to democratization when extended to marginalized groups.

The Problem

However, the existing literature on education and democratization focuses on the state level, and no dataset exists that allows for testing sub-national dynamics. In other words, no readily available source records ethnic groups’ educational attainment across time and space.

To test sub-national variation, I provide a preliminary and novel dataset on ethnic group educational attainment over time. Ultimately, this should allow for investigating how sub-national variation in educational attainment affects the likelihood of democratization.

The Solution: Ethnic Group Education Dataset Overview

The following is an R Markdown document that walks through the creation of the Ethnic Group Education dataset (EGE). The EGE will include all major ethnic groups per country-year (1969-2015) and their educational attainment. No such dataset is readily available. As a first cut and proof of concept, I construct the EGE for 35 countries in Africa.

In short, the dataset is constructed by:

  1. Taking Afrobarometer Survey waves 4-6 and merging them into one dataset.
  2. Using the Linking Ethnic Data in Africa Dataset package in R to link respondents’ languages to ethnic groups across Afrobarometer waves. Specifically, I link individuals to ethnic groups as listed in the Ethnic Power Relations (EPR) dataset, which includes country-year information on ethnic groups and their relative political status (Monopoly, Dominant, Senior Partner, Junior Partner, Powerless, Discriminated, Irrelevant).
  3. Backtracking educational attainment means per ethnic group per year.

The end result is a dataset that contains country-year information on every major ethnic group in 35 African countries and their corresponding:

  1. Average educational attainment from 1969-2015
  2. The corresponding ethnic group information (i.e. is the ethnic group excluded from power, dominant, etc.)
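
The backtracking step is not spelled out in this overview, so the following is only a minimal sketch of one way it could work, not the procedure actually used. It assumes that schooling is complete by a fixed age (`completion_age`, a hypothetical value) and that a group's mean attainment in an earlier year can be approximated from respondents who had already reached that age in that year. The data frame `resp` and the function `backtrack_mean()` are illustrative names.

```r
# Hypothetical respondent-level data: one row per survey respondent.
resp <- data.frame(
  group = c("A", "A", "A", "B", "B"),
  year  = c(2008, 2008, 2008, 2008, 2008),  # survey year
  age   = c(60, 40, 25, 55, 30),            # age at interview
  edu   = c(2, 4, 6, 1, 5),                 # Afrobarometer 0-9 education scale
  stringsAsFactors = FALSE
)

# Assumption (not from the source): schooling is complete by this age.
completion_age <- 25

backtrack_mean <- function(data, target_year) {
  # Age each respondent would have been in target_year.
  age_then <- data$age - (data$year - target_year)
  eligible <- data[age_then >= completion_age, , drop = FALSE]
  if (nrow(eligible) == 0) return(NULL)
  # Mean education per ethnic group among eligible respondents.
  out <- aggregate(edu ~ group, data = eligible, FUN = mean)
  out$year <- target_year
  out
}

# One row per group-year for which at least one respondent is eligible.
ege_sketch <- do.call(rbind, lapply(1969:2008, backtrack_mean, data = resp))
```

Under these assumptions, earlier years rest on fewer (and older) respondents, so the early part of each series is noisier than the years close to the survey.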

Code & Creation of EGE

The following highlights the construction of the dataset, and then provides some preliminary figures/information using the dataset.

Merging all Afrobarometer Datasets

Afrobarometer Round 4

First, Afrobarometer numbers its countries differently each round. Therefore, I want to get the country names into the data so I can assign COW country codes. For each round, I have a corresponding Excel document that lists the country name and Afrobarometer value. I can then use the countrycode package to standardize the values.
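
As a rough sketch of this step, the lookup can be attached with `merge()` and the names converted with `countrycode()`. The toy tables `lookup_r4` and `ab4_toy` are hypothetical stand-ins for the Excel file and the survey data, and the conversion is guarded so the example still runs when the countrycode package is not installed.

```r
# Hypothetical lookup standing in for the per-round Excel file
# (Afrobarometer's round-specific country number -> country name).
lookup_r4 <- data.frame(
  COUNTRY   = c(1, 2, 3),
  Statename = c("Benin", "Botswana", "Burkina Faso"),
  stringsAsFactors = FALSE
)

# Toy survey rows; the real data has one row per respondent.
ab4_toy <- data.frame(
  COUNTRY = c(1, 1, 3),
  RESPNO  = c("BEN0001", "BEN0002", "BFO0001"),
  stringsAsFactors = FALSE
)

# Attach the country name to every respondent.
ab4_toy <- merge(ab4_toy, lookup_r4, by = "COUNTRY", all.x = TRUE)

# Convert country names to COW numeric codes if countrycode is available.
if (requireNamespace("countrycode", quietly = TRUE)) {
  ab4_toy$ccode <- countrycode::countrycode(ab4_toy$Statename,
                                            "country.name", "cown")
}
```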

We’ll keep the following information from the Afrobarometer Wave 4:

  • age (Q1)
  • education (Q89)
  • language (Q3)
  • country (COUNTRY)
  • survey year (DATEINTR)
  • ethnic or national identity (Q83)
  • sex (THISINT)
  • employment
  • urban/rural
  • views on democracy (general)
  • extent of democracy in [Respondent Country]
  • satisfaction w/ democracy in [Respondent Country]
## Age
# Question Number: Q1
# Question: How old are you?
#    Variable Label: Q1. Age
# Values: 18-110, 998-999, -1
# Value Labels: 998=Refused to answer, 999=Don't know, -1=Missing 

ab4$age <- ab4$Q1
ab4$age <- as.numeric(as.character(ab4$age))
ab4$age[ab4$age == -1] <- NA
ab4$age[ab4$age == 998] <- NA
ab4$age[ab4$age == 999] <- NA
#table(ab4$age)


## Education
# Question Number: Q89
# Question: What is the highest level of education you have completed?
# Variable Label: Education of respondent
# Values: 0-9, 99, 998, -1
# Value Labels: 0=No formal schooling, 1=Informal schooling only (including Koranic schooling), 2=Some primary schooling, 3=Primary school completed, 4=Some secondary school/ high school, 5=Secondary school completed/high school completed, 6=Post-secondary qualifications, other than university e.g. a diploma or degree from polytechnic or college, 7=Some university, 8=University completed, 9=Post-graduate, 99=Don’t know, 998=Refused to answer, -1=Missing data

ab4$edu <- as.numeric(ab4$Q89)
ab4$edu[ab4$edu == -1] <- NA
ab4$edu[ab4$edu == 99] <- NA
ab4$edu[ab4$edu == 998] <- NA  # 998 = Refused to answer; otherwise counted as >= 3 below
#table(ab4$edu)

ab4$primary <- ifelse(ab4$edu >= 3, 1, 0)
ab4$secondary <- ifelse(ab4$edu >= 5, 1, 0)
ab4$tertiary <- ifelse(ab4$edu >= 8, 1, 0)

## Language
# Question Number: Q3
# Question: Which [country] language is your home language?
# Variable Label: Language of Respondent
# Values: See codebook
# Value Labels: See codebook

ab4$language <- ab4$Q3

## Survey Year
# Question: Date of interview
# Variable Label: Date of interview
# Values: 04.03.08 – 31.12.08

# table(ab4$DATEINTR)
# Despite the codebook saying the values are only in 2008, the table indicates that some respondents were interviewed into 2009.
# Therefore, I'll create a new variable that takes the first 4 digits/integers of the DATEINTR variable.

library(stringr)  # provides str_sub()
ab4$year <- ab4$DATEINTR
ab4$year <- as.character(ab4$year)
ab4$year <- str_sub(ab4$year, 1, 4)  # keep the first four characters (the year)
ab4$year <- as.numeric(ab4$year)
table(ab4$year)
## 
##  2008  2009 
## 25305  2408
## Ethnic vs. National Identity
# Question Number: Q83
# Question: Let us suppose that you had to choose between being a [Ghanaian/Kenyan/etc.] and being a ________ [R’s Ethnic Group]. Which of the following best expresses your feelings?
# Variable Label: Ethnic or national identity
# Values: 1-5, 7, 9, 998, -1
# Value Labels: 1=I feel only (R’s ethnic group), 2=I feel more (R’s ethnic group) than [Ghanaian/Kenyan/etc.], 3=I feel equally [Ghanaian/Kenyan/etc.] and (R’s ethnic group), 4=I feel more [Ghanaian/Kenyan/etc.] than (R’s ethnic group), 5=I feel only [Ghanaian/Kenyan/etc.], 7=Not applicable, 9=Don’t know, 998=Refused to answer, - 1=Missing data

ab4$identity <- ab4$Q83
ab4$identity[ab4$identity == -1] <- NA
ab4$identity[ab4$identity == 7] <- NA
ab4$identity[ab4$identity == 9] <- NA
ab4$identity[ab4$identity == 998] <- NA  # 998 = Refused to answer

# Urban vs. Rural
## Question Number: URBRUR
## Question: PSU/EA
## Variable Label: Urban or Rural Primary Sampling Unit
## Values: 1-2
## Value Labels: 1=urban, 2=rural
## Note: Answered by interviewer

ab4$rural <- ab4$URBRUR
ab4$rural <- ab4$rural - 1
#1: rural, 0: urban

# Sex
# Question Number: THISINT
# Question: This interview must be with a:
# Variable Label: This interview, gender
# Values: 1, 2
# Value Labels: 1=Male, 2=Female
# Note: Answered by interviewer


ab4$female <- ab4$THISINT
ab4$female <- ab4$female-1
#1: female, 0 : male

# Employed
# Question Number: Q94
# Question: Do you have a job that pays a cash income? Is it full-time or part-time? And are you presently looking for a job (even if you are presently working)?
# Variable Label: Employment status
# Values: 0-5, 9, 998, -1
# Value Labels: 0=No (not looking), 1=No (looking), 2=Yes, part time (not looking), 3=Yes, part time (looking), 4=Yes, full time (not looking), 5=Yes, full time (looking), 9=Don’t know, 998=Refused to answer, -1=Missing data
# Source: SAB

ab4$employment <- ab4$Q94
ab4$employment[ab4$employment == -1] <- NA
ab4$employment[ab4$employment == 9] <- NA
ab4$employment[ab4$employment == 998] <- NA  # 998 = Refused to answer; otherwise counted as employed below

# 1: has a paying job (part time or full time), 0: no paying job
ab4$employed <- ifelse(ab4$employment > 1, 1, 0)


# View on Democracy
# Question Number: Q30
# Question: Which of these three statements is closest to your own opinion?
# Statement 1: Democracy is preferable to any other kind of government.
# Statement 2: In some circumstances, a non-democratic government can be preferable.
# Statement 3: For someone like me, it doesn’t matter what kind of government we have.
# Variable Label: Support for democracy
# Values: 1-3, 9, 998, -1
# Value Labels: 1=Statement 3: Doesn’t matter, 2=Statement 2: Sometimes non-democratic preferable, 3=Statement 1: Democracy preferable, 9=Don’t know, 998=Refused to answer, -1=Missing data

#table(ab4$Q30)
ab4$democracy <- ab4$Q30
ab4$democracy[ab4$democracy == -1] <- NA
ab4$democracy[ab4$democracy == 9] <- NA
ab4$democracy[ab4$democracy == 998] <- NA  # 998 = Refused to answer
#table(ab4$democracy)

# Extent of Democracy in [Country]
# Question Number: Q42A
# Question: In your opinion how much of a democracy is [Ghana/Kenya/etc.] today?
# Variable Label: Extent of democracy
# Values: 1-4, 8, 9, 998, -1
# Value Labels: 1=Not a democracy, 2=A democracy, with major problems, 3=A democracy, but with minor problems, 4=A full democracy, 8=Do not understand question/ do not understand what ‘democracy’ is, 9=Don’t know, 998=Refused to answer, -1=Missing data
# Source: Ghana 97

# table(ab4$Q42A)
ab4$democracyInCountry <- ab4$Q42A
ab4$democracyInCountry[ab4$democracyInCountry == -1] <- NA
ab4$democracyInCountry[ab4$democracyInCountry == 8] <- NA
ab4$democracyInCountry[ab4$democracyInCountry == 9] <- NA
ab4$democracyInCountry[ab4$democracyInCountry == 998] <- NA  # 998 = Refused to answer
# table(ab4$democracyInCountry)

# Satisfied w/ Democracy in [Country]
# Question Number: Q43
# Question: Overall, how satisfied are you with the way democracy works in [Ghana/Kenya/etc.]? Are you:
# Variable Label: Satisfaction with democracy
# Values: 0-4, 9, 998, -1
# Value Labels: 0=My country is not a democracy, 1=Not at all satisfied, 2=Not very satisfied, 3=Fairly satisfied, 4=Very satisfied, 9=Don’t know, 998=Refused to answer, -1=Missing data
# Source: Eurobarometer

#table(ab4$Q43)
ab4$satisfiedDemInCountry <- ab4$Q43
ab4$satisfiedDemInCountry[ab4$satisfiedDemInCountry == -1] <- NA
ab4$satisfiedDemInCountry[ab4$satisfiedDemInCountry == 9] <- NA
ab4$satisfiedDemInCountry[ab4$satisfiedDemInCountry == 998] <- NA  # 998 = Refused to answer
#table(ab4$satisfiedDemInCountry)



# Trust in President
# Question Number: Q49A
# Question: How much do you trust each of the following, or haven’t you heard enough about them to say: The President?
# Variable Label: Trust president
# Values: 0-3, 9, 998, -1
# Value Labels: 0=Not at all, 1=Just a little, 2=Somewhat, 3=A lot, 9=Don’t know/Haven’t heard enough, 998=Refused to answer, -1=Missing data
# Source: Zambia96
# Note: “Prime Minister” in Lesotho; “President” and “Prime Minister” in Burkina Faso, Cape Verde, Madagascar, Mali, Mozambique, Namibia, Senegal and Zimbabwe; “President” in Benin, Botswana, Ghana, Kenya, Liberia, Malawi, Nigeria, South Africa, Tanzania, Uganda, and Zambia.

ab4$trustPresident <- ab4$Q49A
ab4$trustPresident[ab4$trustPresident == -1] <- NA
ab4$trustPresident[ab4$trustPresident == 9] <- NA
ab4$trustPresident[ab4$trustPresident == 998] <- NA  # 998 = Refused to answer
# table(ab4$trustPresident)


# Trust in Parliament
# Question Number: Q49B
# Question: How much do you trust each of the following, or haven’t you heard enough about them to say: Parliament?
# Variable Label: Trust parliament/national assembly
# Values: 0-3, 9, 998, -1
# Value Labels: 0=Not at all, 1=Just a little, 2=Somewhat, 3=A lot, 9=Don’t know/Haven’t heard enough, 998=Refused to answer, -1=Missing data
# Source: Adapted from Zambia96
# Note: “National Assembly” in Benin, Burkina Faso, Cape Verde, Liberia, Madagascar, Malawi, Mali, Mozambique, Nigeria, Tanzania, Uganda, Zambia; “Parliament” in Botswana, Ghana, Kenya, Lesotho, Namibia, Senegal, and South Africa; “House of Assembly” in Zimbabwe.

#table(ab4$Q49B)
ab4$trustParliament <- ab4$Q49B
ab4$trustParliament[ab4$trustParliament == -1] <- NA
ab4$trustParliament[ab4$trustParliament == 9] <- NA
ab4$trustParliament[ab4$trustParliament == 998] <- NA  # 998 = Refused to answer
#table(ab4$trustParliament)



# Trust in Ruling Party
# Question Number: Q49E
# Question: How much do you trust each of the following, or haven’t you heard enough about them to say: The Ruling Party?
# Variable Label: Trust the ruling party
# Values: 0-3, 9, 998, -1
# Value Labels: 0=Not at all, 1=Just a little, 2=Somewhat, 3=A lot, 9=Don’t know/Haven’t heard enough, 998=Refused to answer, -1=Missing data
# Source: Adapted from Zambia96


#table(ab4$Q49E)
ab4$trustRP <- ab4$Q49E
ab4$trustRP[ab4$trustRP == -1] <- NA
ab4$trustRP[ab4$trustRP == 9] <- NA
ab4$trustRP[ab4$trustRP == 998] <- NA  # 998 = Refused to answer
#table(ab4$trustRP)

# Trust Traditional Leaders
# Question Number: Q49I
# Question: How much do you trust each of the following, or haven’t you heard enough about them to say: Traditional leaders
# Variable Label: Trust traditional leaders
# Values: 0-3, 9, 998, -1
# Value Labels: 0=Not at all, 1=Just a little, 2=Somewhat, 3=A lot, 9=Don’t know/Haven’t heard enough, 998=Refused to answer, -1=Missing data
# Source: Zambia 96

#table(ab4$Q49I)
#ab4$trustTL <- ab4$Q49I
#ab4$trustTL[ab4$trustTL == -1] <- NA
#ab4$trustTL[ab4$trustTL == 7] <- NA
#ab4$trustTL[ab4$trustTL == 9] <- NA
#table(ab4$trustTL)



# Ethnic Group Treated Unfairly
# Question Number: Q82
# Question: How often are ___________s [R’s Ethnic Group] treated unfairly by the government?
# Variable Label: Ethnic group treated unfairly
# Values: 0-3, 7, 9, 998, -1
# Value Labels: 0=Never, 1=Sometimes, 2=Often, 3=Always, 7=Not applicable, 9=Don’t know, 998=Refused to answer, -1=Missing data
# Source: SAB
# Note: Interviewer probed for strength of opinion. If respondent did not identify any group on this question – that is, if they “Refused to answer” (998), said “Don’t know” (999), or “Ghanaian only” (990) – then the interviewer marked “Not applicable” for questions 80-83 and continued to question 84.

#table(ab4$Q82)
ab4$treatedUnfairly <- ab4$Q82
ab4$treatedUnfairly[ab4$treatedUnfairly == -1] <- NA
ab4$treatedUnfairly[ab4$treatedUnfairly == 7] <- NA
ab4$treatedUnfairly[ab4$treatedUnfairly == 9] <- NA
ab4$treatedUnfairly[ab4$treatedUnfairly == 998] <- NA  # 998 = Refused to answer
#table(ab4$treatedUnfairly)
##   COUNTRY Statename ccode  RESPNO age edu primary secondary tertiary language
## 1       1     Benin   434 BEN0001  38   4       1         0        0      100
## 2       1     Benin   434 BEN0002  46   2       0         0        0      104
## 3       1     Benin   434 BEN0003  28   4       1         0        0      101
## 4       1     Benin   434 BEN0004  30   3       1         0        0      100
## 5       1     Benin   434 BEN0005  23   4       1         0        0      100
## 6       1     Benin   434 BEN0006  24   4       1         0        0      100
##   year identity rural female employment employed democracy democracyInCountry
## 1 2008        2     0      1          0        0         3                  4
## 2 2008        4     0      0          1        0         3                  4
## 3 2008       NA     0      1          2        1         3                  4
## 4 2008        5     0      0          1        0         2                  3
## 5 2008        5     0      1          1        0         3                  2
## 6 2008        5     0      0          1        0         2                  3
##   satisfiedDemInCountry trustPresident trustParliament trustRP treatedUnfairly
## 1                     4              3               1       1               0
## 2                     4              1               1       0               0
## 3                     2              3               2       2              NA
## 4                     2              1               1       0               0
## 5                     2              3               3       1               0
## 6                     3              1               2       2               1

Afrobarometer Round 5 & 6

Now I want to do the same for Round 5 and Round 6 of Afrobarometer. I do not replicate the code below, but it is otherwise identical in execution to the code for Round 4.

##   COUNTRY Statename ccode  RESPNO age edu primary secondary tertiary language
## 1       1   Algeria   615 ALG0001  48   5       1         1        0        5
## 2       1   Algeria   615 ALG0002  36   5       1         1        0        2
## 3       1   Algeria   615 ALG0003  34   5       1         1        0        5
## 4       1   Algeria   615 ALG0004  23   6       1         1        0        5
## 5       1   Algeria   615 ALG0005  41   2       0         0        0        5
## 6       1   Algeria   615 ALG0006  38   4       1         0        0        5
##   year identity rural female employment employed democracy democracyInCountry
## 1 2013       NA     1      0          3        1         2                  1
## 2 2013       NA     1      1          3        1         3                  1
## 3 2013       NA     1      0          3        1         3                  2
## 4 2013       NA     1      1          3        1         2                  3
## 5 2013       NA     1      0          3        1         3                  3
## 6 2013       NA     1      1          0        0         3                 NA
##   satisfiedDemInCountry trustPresident trustParliament trustRP treatedUnfairly
## 1                     1              3               1       0              NA
## 2                     2              2               1       1              NA
## 3                     3              2               2       1              NA
## 4                    NA              2               1       0              NA
## 5                     3              2               0       0              NA
## 6                    NA              2              NA      NA              NA
##   COUNTRY Statename ccode  RESPNO age edu primary secondary tertiary language
## 1       1   Algeria   615 ALG0001  27   2       0         0        0     1420
## 2       1   Algeria   615 ALG0002  30   2       0         0        0     1420
## 3       1   Algeria   615 ALG0003  62   8       1         1        1     1420
## 4       1   Algeria   615 ALG0004  30   5       1         1        0     1420
## 5       1   Algeria   615 ALG0005  35   4       1         0        0     1420
## 6       1   Algeria   615 ALG0006  21   7       1         1        0     1420
##   year identity rural female employment employed democracy democracyInCountry
## 1 2015       NA     1      0          3        1         1                 NA
## 2 2015        2     1      1          0        0         3                  1
## 3 2015       NA     1      0          0        0         3                  2
## 4 2015       NA     1      1          3        1         3                  3
## 5 2015       NA     1      0          3        1         2                  2
## 6 2015       NA     1      1          1        0         3                  2
##   satisfiedDemInCountry trustPresident trustParliament trustRP treatedUnfairly
## 1                    NA              3               2       2              NA
## 2                     3              1               1       1               0
## 3                     3              1               1       1               0
## 4                     3              2               1       2              NA
## 5                     3              2               0       1              NA
## 6                     3              2               1       1              NA
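
Because the same recoding pattern is repeated for every variable and every round, it could also be wrapped in a small helper. The sketch below is an illustration rather than the code used for this document: `recode_missing()` and `clean_round()` are hypothetical names, the toy data frames are invented, and it assumes the question numbers are the same across rounds, which should be checked against each round's codebook.

```r
# Set survey non-response codes to NA in one step.
recode_missing <- function(x, codes = c(-1, 9, 998)) {
  x <- as.numeric(as.character(x))  # safe for factor or character input
  x[x %in% codes] <- NA
  x
}

# Toy stand-ins for two rounds (hypothetical values).
ab5_toy <- data.frame(Q1 = c(34, 998, -1), Q30 = c(3, 9, 1))
ab6_toy <- data.frame(Q1 = c(21, 999, 45), Q30 = c(2, 998, 3))

# Apply the same cleaning to one round.
# Assumption: Q1 and Q30 carry the same meaning in every round.
clean_round <- function(ab) {
  ab$age       <- recode_missing(ab$Q1,  codes = c(-1, 998, 999))
  ab$democracy <- recode_missing(ab$Q30, codes = c(-1, 9, 998))
  ab
}

rounds <- lapply(list(ab5 = ab5_toy, ab6 = ab6_toy), clean_round)
```

Passing the codes explicitly per variable keeps valid values (for example, a 9 on the education scale) from being dropped by accident.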

Merging Round 4-6

Each of the Afrobarometer datasets (rounds 4-6) is now identical in structure, in that each has the following variables:

  • COUNTRY
  • Statename
  • ccode
  • RESPNO
  • age
  • edu (+ primary, secondary, tertiary)
  • rural
  • female
  • employment (status)
  • employed (binary)
  • democracy
  • democracy in country
  • satisfaction with democracy in country
  • trust in president
  • trust in parliament
  • trust in ruling party
  • whether ethnic group is treated unfairly
  • language
  • year
  • identity

Because they also share the same column order, I could rbind() them; however, the “RESPNO” values would then be duplicated across rounds. Therefore, the first thing I do is append the survey year to each RESPNO.

## [1] "BEN0001-2008" "BEN0002-2008" "BEN0003-2008" "BEN0004-2008" "BEN0005-2008"
## [6] "BEN0006-2008"
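
A minimal sketch of this step, using invented toy rows rather than the real survey data (`ab4_toy`, `ab5_toy`, and `add_year_to_id()` are hypothetical names):

```r
# Toy rounds that reuse the same respondent numbers.
ab4_toy <- data.frame(RESPNO = c("BEN0001", "BEN0002"),
                      year = c(2008, 2008), edu = c(4, 2),
                      stringsAsFactors = FALSE)
ab5_toy <- data.frame(RESPNO = c("BEN0001", "BEN0002"),
                      year = c(2011, 2011), edu = c(3, 5),
                      stringsAsFactors = FALSE)

# Make respondent IDs unique across rounds by appending the survey year.
add_year_to_id <- function(ab) {
  ab$RESPNO <- paste0(ab$RESPNO, "-", ab$year)
  ab
}

# Stack the rounds and confirm that no ID appears twice.
ab_all <- rbind(add_year_to_id(ab4_toy), add_year_to_id(ab5_toy))
stopifnot(!anyDuplicated(ab_all$RESPNO))
```

Note that `rbind()` on data frames matches columns by name, so identical column names matter more than identical column order.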

Linking Ethnic Data with EPR

Now I want to use the Linking Ethnic Data in Africa Dataset package to treat each respondent’s language as an indicator of ethnicity, which I can then link to other datasets (such as EPR). The Ethnic Power Relations dataset includes country-year information on ethnic groups and their relative political status (Monopoly, Dominant, Senior Partner, Junior Partner, Powerless, Discriminated, Irrelevant).

LEDA() lets me produce a dataset that includes the language name from Afrobarometer and its corresponding ethnic group from EPR. Here’s an example.

##     a.group                              b.group        a.type b.type
## 39     Adja                  Southwestern (Adja) Afrobarometer    EPR
## 97     Adja                  Southwestern (Adja) Afrobarometer    EPR
## 166    Adja Southeastern (Yoruba/Nagot and Goun) Afrobarometer    EPR
## 213    Adja                  Southwestern (Adja) Afrobarometer    EPR
## 282    Adja Southeastern (Yoruba/Nagot and Goun) Afrobarometer    EPR
## 329    Adja                  Southwestern (Adja) Afrobarometer    EPR

Next, I need to load in the corresponding “Language ID Number” and “Language Name” from Afrobarometer. The following Excel documents were created using the codebooks from Afrobarometer. In short, I copied and pasted the delimited ID=language list from each codebook and used Excel to split it automatically into individual rows/columns.

Let’s run a few checks to see how well the language names from the codebooks match those in LEDA().

##  [1] "Fuls"                                    
##  [2] "Moore"                                   
##  [3] "Senoufo"                                 
##  [4] "Arabe"                                   
##  [5] "Khassonke"                               
##  [6] "Malinke"                                 
##  [7] "Soninke/ Sarakoll"                       
##  [8] "Sonrhai"                                 
##  [9] "Mang'anja"                               
## [10] "Oshiwambo"                               
## [11] "Ijaw/Kalabari/Okirika/Andoni/Ogoni/Nembe"
##  [1] "Moore"                               "Senoufo"                            
##  [3] "Baoule"                              "Bete"                               
##  [5] "Godie"                               "Guere"                              
##  [7] "Diakanke"                            "konianke"                           
##  [9] "Maasai / Samburu"                    "Meru / Embu"                        
## [11] "\"Official\" Malagasy"               "Khassonke"                          
## [13] "Malinke"                             "Peulh / Fulfude"                    
## [15] "Soninke / Sarakolle"                 "Sonrhai"                            
## [17] "Chimang'anja"                        "Oshiwambo (Oshindonga/Oshikwanyama)"
## [19] "Beri beri"                           "Zarrma/Songhai"                     
## [21] "Kabye"
##  [1] "Moore"                               "Senoufo"                            
##  [3] "Baoule"                              "Bete"                               
##  [5] "Godie"                               "Guere"                              
##  [7] "Bangangte"                           "Foufoulde"                          
##  [9] "Mbede"                               "Myene"                              
## [11] "Nzebi/Metie"                         "Punu/Merie"                         
## [13] "Malgache << officiel >>"             "Malgache avec specificite regionale"
## [15] "Khassonke"                           "Malinke"                            
## [17] "Soninke/Sarakole"                    "Portuguese"                         
## [19] "Zarma/Songhai"
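
A check of this kind can be written with base R's `setdiff()`. The two vectors below are invented excerpts for illustration only; the spellings attributed to the linking table are hypothetical, not taken from LEDA.

```r
# Hypothetical excerpts: names typed from a codebook vs. names in the linking table.
codebook_names <- c("Fon", "Adja", "Moore", "Senoufo")
leda_names     <- c("Fon", "Adja", "More", "Senufo")

# Codebook names with no exact match in the linking table.
unmatched <- setdiff(codebook_names, leda_names)
unmatched
```

Because `setdiff()` compares strings exactly, differences in spacing, capitalization, or accents all show up as mismatches.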

So there is some mismatch, but not much, in each list. Therefore, I fix the spelling of any languages that are obvious matches, i.e., those that differ by just one letter (which I corroborated online to be an alternative spelling) or by an accent mark (which LEDA() does not include in any spelling). I make these changes in my Excel documents so that the language names match the spellings used by LEDA().

## [1] "Arabe"                                   
## [2] "Senufo/ Mianka"                          
## [3] "Soninke/ Sarakoll"                       
## [4] "Ijaw/Kalabari/Okirika/Andoni/Ogoni/Nembe"
## [1] "\"Official\" Malagasy" "Senufo"
## [1] "Senufo"
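
The spelling fixes described above were made by hand in Excel. For comparison, the same kind of correction could be scripted with a named lookup vector, as in this hypothetical sketch (`lang_r4_toy`, the ID 260, and the single entry in `fixes` are illustrative only):

```r
# Toy language table: ID -> name as typed from the codebook.
lang_r4_toy <- data.frame(
  language     = c(100, 101, 260),
  languageName = c("Fon", "Adja", "Senoufo"),
  stringsAsFactors = FALSE
)

# Named vector: codebook spelling -> spelling used by the linking table.
fixes <- c("Senoufo" = "Senufo")

# Replace only the names that have an entry in `fixes`.
hit <- lang_r4_toy$languageName %in% names(fixes)
lang_r4_toy$languageName[hit] <- unname(fixes[lang_r4_toy$languageName[hit]])
```

Keeping the corrections in one vector leaves a record of every change, which a manual edit in a spreadsheet does not.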

Individual Datasets

So at this point I have three datasets for each Afrobarometer round:

  • ab# - the Afrobarometer Dataset that contains the survey information.
  • lang.r# - the language number and corresponding language name from each Afrobarometer round
  • link.ab# - the Afrobarometer language name and corresponding EPR name for each round
##   COUNTRY Statename ccode       RESPNO age edu primary secondary tertiary
## 1       1     Benin   434 BEN0001-2008  38   4       1         0        0
## 2       1     Benin   434 BEN0002-2008  46   2       0         0        0
## 3       1     Benin   434 BEN0003-2008  28   4       1         0        0
## 4       1     Benin   434 BEN0004-2008  30   3       1         0        0
## 5       1     Benin   434 BEN0005-2008  23   4       1         0        0
## 6       1     Benin   434 BEN0006-2008  24   4       1         0        0
##   language year identity rural female employment employed democracy
## 1      100 2008        2     0      1          0        0         3
## 2      104 2008        4     0      0          1        0         3
## 3      101 2008       NA     0      1          2        1         3
## 4      100 2008        5     0      0          1        0         2
## 5      100 2008        5     0      1          1        0         3
## 6      100 2008        5     0      0          1        0         2
##   democracyInCountry satisfiedDemInCountry trustPresident trustParliament
## 1                  4                     4              3               1
## 2                  4                     4              1               1
## 3                  4                     2              3               2
## 4                  3                     2              1               1
## 5                  2                     2              3               3
## 6                  3                     3              1               2
##   trustRP treatedUnfairly
## 1       1               0
## 2       0               0
## 3       2              NA
## 4       0               0
## 5       1               0
## 6       2               1
## # A tibble: 6 x 2
##   language languageName
##      <dbl> <chr>       
## 1        1 English     
## 2        2 French      
## 3        3 Portuguese  
## 4        4 Kiswahili   
## 5      100 Fon         
## 6      101 Adja
##      a.cowcode a.iso3c a.group
## 39         434     BEN    Adja
## 166        434     BEN    Adja
## 1210       434     BEN  Bariba
## 1268       434     BEN   Dendi
## 1333       434     BEN     Fon
## 1630       434     BEN    Goun
##                                                                b.group
## 39                                                 Southwestern (Adja)
## 166                               Southeastern (Yoruba/Nagot and Goun)
## 1210 Northern (Bariba, Peul, Ottamari, Yoa-Lokpa, Dendi, Gourmanchema)
## 1268 Northern (Bariba, Peul, Ottamari, Yoa-Lokpa, Dendi, Gourmanchema)
## 1333                                               South/Central (Fon)
## 1630                              Southeastern (Yoruba/Nagot and Goun)
##             a.type b.type
## 39   Afrobarometer    EPR
## 166  Afrobarometer    EPR
## 1210 Afrobarometer    EPR
## 1268 Afrobarometer    EPR
## 1333 Afrobarometer    EPR
## 1630 Afrobarometer    EPR
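
The three tables fit together through two joins: language ID to language name, then country plus language name to EPR group. The sketch below shows that chain on toy rows; the column names follow the printed output above, but the data and the object names (`ab_toy`, `lang_toy`, `link_toy`) are illustrative assumptions.

```r
# Toy versions of the three tables.
ab_toy <- data.frame(
  RESPNO   = c("BEN0001-2008", "BEN0002-2008", "BEN0003-2008"),
  ccode    = 434,
  language = c(100, 104, 101),
  stringsAsFactors = FALSE
)
lang_toy <- data.frame(language = c(100, 101),
                       languageName = c("Fon", "Adja"),
                       stringsAsFactors = FALSE)
link_toy <- data.frame(
  a.cowcode = 434,
  a.group   = c("Fon", "Adja"),
  b.group   = c("South/Central (Fon)", "Southwestern (Adja)"),
  stringsAsFactors = FALSE
)

# Step 1: language ID -> language name (respondents with unknown IDs are kept).
step1 <- merge(ab_toy, lang_toy, by = "language", all.x = TRUE)

# Step 2: (country, language name) -> EPR group.
linked <- merge(step1, link_toy,
                by.x = c("ccode", "languageName"),
                by.y = c("a.cowcode", "a.group"),
                all.x = TRUE)
```

Using `all.x = TRUE` in both joins keeps unmatched respondents in the data with `NA` for the EPR group, so the match rate can be computed afterwards.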

Now I want to merge EPR with the link.ab# list.

Loading EPR

##   gwid     statename from   to             group groupid gwgroupid umbrella
## 1    2 United States 1946 1965            Whites    1000    201000       NA
## 2    2 United States 1946 1965           Latinos    2000    202000       NA
## 3    2 United States 1946 1965 African Americans    3000    203000       NA
## 4    2 United States 1946 1965   Asian Americans    4000    204000       NA
## 5    2 United States 1946 1965  American Indians    5000    205000       NA
## 6    2 United States 1946 1965    Arab Americans    6000    206000       NA
##     size        status reg_aut
## 1 0.6910      MONOPOLY        
## 2 0.1250    IRRELEVANT        
## 3 0.1240 DISCRIMINATED   false
## 4 0.0360    IRRELEVANT        
## 5 0.0078     POWERLESS    true
## 6 0.0042    IRRELEVANT
## Warning in countrycode(EPR$statename, "country.name", "cown"): Some values were not matched unambiguously: Serbia
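
EPR records each group's status over a period (`from` to `to`), so attaching a status to a survey respondent means picking the period that covers the survey year. A minimal sketch of that join follows, assuming toy data: the periods and statuses in `epr_toy` are made up for illustration and are not EPR's actual coding.

```r
# Toy EPR rows: the same group with two periods (made-up periods and statuses).
epr_toy <- data.frame(
  ccode  = c(434, 434),
  group  = c("South/Central (Fon)", "South/Central (Fon)"),
  from   = c(1991, 2007),
  to     = c(2006, 2017),
  status = c("SENIOR PARTNER", "JUNIOR PARTNER"),
  stringsAsFactors = FALSE
)

# Toy respondents already linked to an EPR group name.
linked_toy <- data.frame(
  RESPNO  = c("BEN0001-2008", "BEN0004-2008"),
  ccode   = 434,
  year    = 2008,
  b.group = "South/Central (Fon)",
  stringsAsFactors = FALSE
)

# Join on country and group, which yields one candidate row per EPR period.
cand <- merge(linked_toy, epr_toy,
              by.x = c("ccode", "b.group"),
              by.y = c("ccode", "group"))

# Keep only the EPR period that covers the survey year.
matched <- cand[cand$year >= cand$from & cand$year <= cand$to, ]
```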

At this point, given how I am linking languages using LEDA(), some languages in Afrobarometer may correspond to multiple ethnic groups in the same country, and each of those ethnic groups may have a different status. Therefore, I adopt the following coding rules.

In the picture below, you can see that the language “Adja” (from the Afrobarometer Wave 4 language response) corresponds to two ethnic groups in Benin: groups in the Southwest and Southeast. The same applies to the language “Goun”. In this case, regardless of the language-ethnic group pairing, the “status” does not change. In other words, all individuals who speak “Adja” (in the SW or SE) are “Junior Partners” and all individuals who speak “Goun” are “Junior Partners”. In such cases, I simply collapse the observations (that is, remove one row from each pair so that I don’t get duplicates when merging).

Example 1 - Excel Screencap


The next example gives us two other possibilities. In the first case (light green), the language “Akan” is linked to two ethnic groups in Ghana: the Asante (Akan) and the “Other Akans”. The Asante are “Senior Partner” and the Other Akans are “Junior Partner”. In such cases, I prioritize whether or not the group has power (i.e., I don’t intend to differentiate between senior and junior partners). Therefore, I delete the “Other Akan” group.

Alternatively, the language “Ijaw/Kalabari/Okirika/Andoni/Ogoni/Nembe” (dark green) in Nigeria applies to two ethnic groups, the Ijaw and the Ogoni; however, the Ijaw are “Junior Partner” and the Ogoni are “Powerless”. Therefore, I delete both, as I am unable to differentiate them in the analysis.

Example 2 - Excel Screencap

By and large, the majority of countries had no duplicates. Those that did have duplicates had only 1-2 affected groups. The exception was Namibia, in which nearly all languages coincided with multiple ethnic groups that ultimately had different statuses. I still followed the above rules.
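The coding rules above can be sketched as a small function. This is a Python illustration rather than the R code used for the actual merge, and the field names (`country`, `language`, `group`, `status`) are hypothetical. Two choices are assumptions of the sketch, not rules stated in the text: keeping the first listed group when all linked groups hold power, and dropping a language when its groups are all excluded but carry different statuses.

```python
from collections import defaultdict

# EPR statuses treated as "in power".
IN_POWER = {"MONOPOLY", "DOMINANT", "SENIOR PARTNER", "JUNIOR PARTNER"}


def resolve_language_links(links):
    """Reduce each (country, language) pair to at most one linked group.

    Returns a dict mapping (country, language) to the kept link, or to None
    when the linked groups cannot be told apart in the analysis.
    """
    by_language = defaultdict(list)
    for link in links:
        by_language[(link["country"], link["language"])].append(link)

    resolved = {}
    for key, candidates in by_language.items():
        statuses = {c["status"] for c in candidates}
        if len(statuses) == 1:
            # Same status for every linked group: collapse to a single row.
            resolved[key] = candidates[0]
        elif statuses <= IN_POWER:
            # Statuses differ, but every group holds power.
            # Keeping the first listed row is an assumption of this sketch.
            resolved[key] = candidates[0]
        else:
            # Statuses differ in a way that matters (e.g. Junior Partner
            # vs. Powerless): drop the language entirely.
            resolved[key] = None
    return resolved


# The three cases described in the text.
links = [
    {"country": "Benin", "language": "Adja", "group": "Southwestern (Adja)", "status": "JUNIOR PARTNER"},
    {"country": "Benin", "language": "Adja", "group": "Southeastern (Yoruba/Nagot and Goun)", "status": "JUNIOR PARTNER"},
    {"country": "Ghana", "language": "Akan", "group": "Asante (Akan)", "status": "SENIOR PARTNER"},
    {"country": "Ghana", "language": "Akan", "group": "Other Akans", "status": "JUNIOR PARTNER"},
    {"country": "Nigeria", "language": "Ijaw/Kalabari/Okirika/Andoni/Ogoni/Nembe", "group": "Ijaw", "status": "JUNIOR PARTNER"},
    {"country": "Nigeria", "language": "Ijaw/Kalabari/Okirika/Andoni/Ogoni/Nembe", "group": "Ogoni", "status": "POWERLESS"},
]
resolved = resolve_language_links(links)
```

Under these assumptions, “Adja” and “Akan” each keep one group, while the Nigerian language entry resolves to `None` and its respondents are left unmatched.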

Example 3 - Excel Screencap

At this point, I’ve fully merged the Afrobarometer Wave 4 dataset with EPR: individuals are linked to ethnic groups, as defined in EPR, through the language they report speaking. Let’s take a look at how many individuals in Afrobarometer Wave 4 now have corresponding information in EPR.

## [1] 27713
## [1] 8481
## [1] 19232
## [1] 0.6939703

Nearly 70% of respondents in Afrobarometer Wave 4 have a corresponding EPR group. An EPR group likely exists for the missing 30% as well, but I had to drop that information because their languages correspond to ethnic groups with conflicting statuses. I do hope to return to this in the future, but for now I move forward as a proof of concept.
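For reference, the four numbers printed above appear to be the total number of respondents, the number without an EPR match, the number with a match, and the matched share (this labelling is inferred from the arithmetic, since 8481 + 19232 = 27713). A minimal Python sketch of that tally, assuming a hypothetical `status` field that is `None` for unmatched respondents:

```python
def match_rate(respondents):
    """Return (total, unmatched, matched, share matched) for a list of dicts."""
    total = len(respondents)
    unmatched = sum(1 for r in respondents if r.get("status") is None)
    matched = total - unmatched
    return total, unmatched, matched, matched / total


# Toy data with made-up respondent IDs: three matched, one unmatched.
toy = [
    {"RESPNO": "X0001", "status": "JUNIOR PARTNER"},
    {"RESPNO": "X0002", "status": "JUNIOR PARTNER"},
    {"RESPNO": "X0003", "status": "POWERLESS"},
    {"RESPNO": "X0004", "status": None},
]
toy_summary = match_rate(toy)

# The share reported above, recomputed from the printed counts.
reported_share = 19232 / 27713
```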

At this point, I want to repeat the above steps (beginning with loading EPR) two more times - once for Afrobarometer Wave 5 (2011) and once for Afrobarometer Wave 6 (2015).

Repeat for AB5 and AB6

I repeat the above steps (beginning with “Loading EPR”) for AB5 and AB6, but I do not replicate it below.

As above, there are similar problems where 1 language coincides with multiple ethnic groups. This is particularly true in North Africa, as shown below.

In the case of Morocco, Arabic coincides with two ethnic groups - Arabs and Sahrawis. Arabs are dominant, and Sahrawis are discriminated. However, Arabic-speaking Sahrawis only make up .016 of the population. Similar issues arise with Arabic being the sole language in Sudan and Egypt.

Example 4 - Excel Screencap

Therefore, the only change I really make is that if multiple ethnic groups correspond to a single language, and one of those groups makes up less than .1 of the population (i.e., 10%, and often much smaller) and is powerless, I delete that group in favor of the ethnic group that is much larger and has power. This is to better capture scenarios where very small marginalized ethnic groups (who likely are not even picked up by Afrobarometer surveys) speak the same language as larger empowered groups. Therefore, in the above, I delete the information from rows 277 and 278.
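A minimal Python sketch of this additional rule follows; it is an illustration, not the R code used for the merge. It reads the “.1” threshold as a proportion on the same scale as the EPR `size` column, which is an interpretation (the Sahrawi share of .016 has to fall below it). It also treats any group outside power, not only groups coded “Powerless”, as eligible for removal, which is a second assumption. Field names are hypothetical.

```python
IN_POWER = {"MONOPOLY", "DOMINANT", "SENIOR PARTNER", "JUNIOR PARTNER"}
SIZE_THRESHOLD = 0.1  # population proportion; an interpretation of the text


def drop_tiny_excluded(candidates, threshold=SIZE_THRESHOLD):
    """Remove very small out-of-power groups that share a language with an
    in-power group. Returns the candidates unchanged when no linked group
    holds power."""
    if not any(c["status"] in IN_POWER for c in candidates):
        return candidates
    return [
        c for c in candidates
        if c["status"] in IN_POWER or c["size"] >= threshold
    ]


# Morocco, as described above. The Arab group's size is not given in the
# text and is not needed by the rule, so it is left unset here.
morocco_arabic = [
    {"group": "Arabs", "status": "DOMINANT", "size": None},
    {"group": "Sahrawis", "status": "DISCRIMINATED", "size": 0.016},
]
kept = drop_tiny_excluded(morocco_arabic)
```

After this filter only one group remains for Arabic in Morocco, so the language no longer produces a duplicate when merging.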

Vertical Merge

At this point in time, I have the Afrobarometer Wave 4-6 surveys merged with EPR. For roughly 70% of all respondents, I have their corresponding Ethnic Group Status information (which is not included in Afrobarometer).

##   COUNTRY Statename ccode       RESPNO age edu primary secondary tertiary
## 1       1     Benin   434 BEN0001-2008  38   4       1         0        0
## 2       1     Benin   434 BEN0002-2008  46   2       0         0        0
## 3       1     Benin   434 BEN0003-2008  28   4       1         0        0
## 4       1     Benin   434 BEN0004-2008  30   3       1         0        0
## 5       1     Benin   434 BEN0005-2008  23   4       1         0        0
## 6       1     Benin   434 BEN0006-2008  24   4       1         0        0
##   language year identity rural female employment employed democracy
## 1      100 2008        2     0      1          0        0         3
## 2      104 2008        4     0      0          1        0         3
## 3      101 2008       NA     0      1          2        1         3
## 4      100 2008        5     0      0          1        0         2
## 5      100 2008        5     0      1          1        0         3
## 6      100 2008        5     0      0          1        0         2
##   democracyInCountry satisfiedDemInCountry trustPresident trustParliament
## 1                  4                     4              3               1
## 2                  4                     4              1               1
## 3                  4                     2              3               2
## 4                  3                     2              1               1
## 5                  2                     2              3               3
## 6                  3                     3              1               2
##   trustRP treatedUnfairly languageName                                group
## 1       1               0          Fon                  South/Central (Fon)
## 2       0               0       Yoruba Southeastern (Yoruba/Nagot and Goun)
## 3       2              NA         Adja                  Southwestern (Adja)
## 4       0               0          Fon                  South/Central (Fon)
## 5       1               0          Fon                  South/Central (Fon)
## 6       2               1          Fon                  South/Central (Fon)
##    size         status
## 1 0.330 JUNIOR PARTNER
## 2 0.185 JUNIOR PARTNER
## 3 0.150 JUNIOR PARTNER
## 4 0.330 JUNIOR PARTNER
## 5 0.330 JUNIOR PARTNER
## 6 0.330 JUNIOR PARTNER

To note, the “ab_all_final.Rda” dataset is the individual respondent information from Afrobarometer that can be used to tackle Question 1.

Please see the “clott_egd_q1_rmark.Rmd” file for a preliminary data analysis for Question 1.

Backtracking Age Profiles

Let’s try to create an aggregate now. The “aball” dataset contains three waves of Afrobarometer:

  • Afrobarometer Wave 4 - Conducted in 2008
  • Afrobarometer Wave 5 - Conducted from 2011-2013
  • Afrobarometer Wave 6 - Conducted from 2014-2015

Respondents were therefore surveyed at different times, and their ages are not standardized - a problem I need to fix before backtracking age-cohort profiles. To do so, I increase everyone’s age depending on when the survey was conducted. For instance, a 35-year-old who was surveyed in 2008 would (presumably, if alive) be 42 in 2015. Of course this creates issues if someone was already elderly in 2008 (say ages 85+); however, we can keep them in the survey since we are assuming all education would be attained at a younger age. Keeping their observations helps bolster our estimates of earlier years. Furthermore, I’m only standardizing to 2015 - the last year we have information.
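The age adjustment amounts to adding the gap between the survey year and 2015. A minimal Python sketch (not the original R code); the function name is hypothetical, and the two returned values are meant to correspond to the `timeDiff` and `ageUpdate` columns that appear in the merged data:

```python
REFERENCE_YEAR = 2015  # last survey year in the pooled data


def standardize_age(age, survey_year, reference_year=REFERENCE_YEAR):
    """Return (years elapsed since the survey, age in the reference year)."""
    time_diff = reference_year - survey_year
    return time_diff, age + time_diff


# A respondent aged 38 when surveyed in 2008.
time_diff, age_update = standardize_age(38, 2008)
```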

From this information, my preliminary attempt at getting educational attainment rates per country group is to do the following:

  • Create a for loop in which each iteration aggregates the average educational attainment per ethnic group per year, limiting the respondent sample by age as I backtrack through time.
  • Each iteration removes respondents based on their age, with each iteration corresponding to one year.
  • For instance, the first iteration takes the average educational attainment of each ethnic group - using the educational attainment of all respondents (ages 18+). This gives us the educational attainment for 2015.
  • The second iteration takes the average educational attainment of each ethnic group - using the educational attainment of all respondents ages 19+. This gives us the educational attainment for 2014.
  • The third iteration takes the average educational attainment of each ethnic group - using the educational attainment of all respondents ages 20+. This gives us the educational attainment for 2013.
  • Repeat until I have information through 1969.

This method of backtracking group-year information is based upon the assumption that education (at least primary and secondary) will be completed in the first 18 years of respondents’ lives, on average. Tertiary education will still be captured by individuals older than 18.
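The loop described above can be sketched as follows. This is a Python illustration of the procedure, not the original R code; the field names (`country`, `group`, `edu`, `age_2015`) are hypothetical, and `age_2015` is assumed to be the age already standardized to 2015.

```python
from collections import defaultdict


def backtrack_education(respondents, last_year=2015, first_year=1969, min_age=18):
    """Mean education per (country, group) for each year, raising the minimum
    age by one for every year stepped back in time."""
    table = defaultdict(dict)
    for step, year in enumerate(range(last_year, first_year - 1, -1)):
        cutoff = min_age + step  # 18+ for 2015, 19+ for 2014, and so on
        sums = defaultdict(float)
        counts = defaultdict(int)
        for r in respondents:
            if r["age_2015"] >= cutoff:
                key = (r["country"], r["group"])
                sums[key] += r["edu"]
                counts[key] += 1
        for key in counts:
            table[key][year] = sums[key] / counts[key]
    return table


# Toy respondents from a single made-up group.
toy = [
    {"country": "Country A", "group": "Group X", "edu": 4, "age_2015": 18},
    {"country": "Country A", "group": "Group X", "edu": 3, "age_2015": 20},
    {"country": "Country A", "group": "Group X", "edu": 1, "age_2015": 40},
]
series = backtrack_education(toy)[("Country A", "Group X")]
```

A group-year with no remaining respondents is simply absent from the result, so the series for a group ends once its oldest respondent ages out of the sample.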

##   Statename                                group     2015     2014     2013
## 1   Algeria                                Arabs 3.195989 3.195989 3.195989
## 2   Algeria                              Berbers 2.436170 2.436170 2.436170
## 3     Benin                  South/Central (Fon) 2.376828 2.376828 2.376828
## 4     Benin Southeastern (Yoruba/Nagot and Goun) 2.083770 2.083770 2.083770
## 5     Benin                  Southwestern (Adja) 1.954301 1.954301 1.954301
## 6  Botswana                                Birwa 3.659574 3.659574 3.659574
##       2012     2011     2010     2009     2008     2007     2006     2005
## 1 3.182331 3.152953 3.113568 3.066116 3.030558 2.940331 2.856975 2.782822
## 2 2.436170 2.423913 2.423913 2.423913 2.247191 2.247191 2.204545 2.183908
## 3 2.376828 2.376828 2.360953 2.327945 2.330579 2.308523 2.265997 2.240260
## 4 2.083770 2.083770 2.066667 2.060109 2.033241 1.974212 1.917647 1.899696
## 5 1.954301 1.954301 1.929155 1.874302 1.882857 1.849275 1.781818 1.738854
## 6 3.659574 3.652174 3.613636 3.613636 3.613636 3.525000 3.512821 3.432432
##       2004     2003     2002     2001     2000     1999     1998     1997
## 1 2.754140 2.669355 2.609244 2.533432 2.472136 2.398374 2.352349 2.309187
## 2 2.116279 2.107143 2.096386 2.086420 2.075000 2.052632 1.945205 1.944444
## 3 2.214570 2.212587 2.173601 2.125926 2.057508 2.046053 2.038801 2.014679
## 4 1.880878 1.853821 1.783505 1.768421 1.728302 1.675676 1.690083 1.672489
## 5 1.720257 1.732203 1.696864 1.667857 1.661479 1.614458 1.534783 1.490909
## 6 3.388889 3.264706 3.187500 3.187500 2.931034 2.892857 2.892857 2.892857
##       1996     1995     1994     1993     1992     1991     1990     1989
## 1 2.225989 2.165029 2.126016 2.071429 2.053879 1.986333 1.940758 1.888614
## 2 1.943662 1.942857 1.850746 1.815385 1.825397 1.750000 1.750000 1.678571
## 3 2.009615 1.995918 1.983299 2.015982 1.959716 1.936275 1.964770 1.954155
## 4 1.656109 1.668246 1.691176 1.666667 1.620112 1.609195 1.607595 1.624161
## 5 1.421569 1.434555 1.430108 1.471264 1.429412 1.438272 1.430464 1.453237
## 6 2.892857 2.814815 2.730769 2.680000 2.434783 2.434783 2.285714 2.200000
##       1988     1987     1986     1985     1984     1983     1982     1981
## 1 1.864583 1.801105 1.747748 1.665605 1.651316 1.609589 1.521898 1.461847
## 2 1.588235 1.460000 1.372093 1.309524 1.292683 1.275000 1.135135 1.085714
## 3 1.938080 1.862745 1.864769 1.869231 1.888889 1.944206 1.909502 1.848341
## 4 1.657143 1.595420 1.507937 1.521368 1.531532 1.552381 1.572816 1.530612
## 5 1.477273 1.472000 1.394958 1.446429 1.452830 1.473684 1.373626 1.261905
## 6 1.823529 1.937500 1.937500 1.937500 1.733333 1.733333 1.733333 1.642857
##       1980     1979     1978      1977      1976      1975      1974      1973
## 1 1.365957 1.337719 1.273973 1.2537313 1.1516854 1.0952381 1.0931677 1.0764331
## 2 1.117647 1.090909 1.062500 0.8965517 0.6666667 0.5652174 0.5238095 0.5500000
## 3 1.855615 1.821229 1.844720 1.7054795 1.6250000 1.7142857 1.6886792 1.6739130
## 4 1.443038 1.361111 1.347826 1.3333333 1.2181818 1.1250000 0.8571429 0.8055556
## 5 1.209877 1.213333 1.349206 1.3500000 1.3508772 1.3653846 1.3600000 1.3478261
## 6 1.642857 1.769231 1.666667 1.6666667 1.3636364 1.3636364 1.3636364 1.3000000
##        1972      1971      1970      1969      1968
## 1 1.0206897 0.9777778 0.9047619 0.8403361 0.7619048
## 2 0.3888889 0.2941176 0.2500000 0.2142857 0.2307692
## 3 1.5662651 1.5500000 1.6086957 1.5606061 1.4918033
## 4 0.7352941 0.7575758 0.7419355 0.7419355 0.7666667
## 5 1.3333333 1.2857143 1.0810811 1.0882353 1.2692308
## 6 1.4444444 1.2500000 1.2500000 1.2500000 1.2500000

And voila! We have a dataset that has ethnic group education level per country-year. Let’s melt it and add in the EPR information again. The trick is that the EPR information (whether or not a group was discriminated, powerless, in power, etc.) changes historically. So we actually want to merge this new dataset with the old EPR dataset. Then we can say that “X Group was Discriminated in 1975, and their education level was Y.”

Then we can look at a couple figures of what we have.

##   statename                                group year      Edu
## 1   Algeria                                Arabs 2015 3.195989
## 2   Algeria                              Berbers 2015 2.436170
## 3     Benin                  South/Central (Fon) 2015 2.376828
## 4     Benin Southeastern (Yoruba/Nagot and Goun) 2015 2.083770
## 5     Benin                  Southwestern (Adja) 2015 1.954301
## 6  Botswana                                Birwa 2015 3.659574
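As a rough Python sketch of the melt-and-merge step (not the original R code): the wide table is reshaped to one row per group-year, and each row then picks up the status from the EPR period that covers its year. The sketch assumes EPR records each status as a period with `from` and `to` years; the country, group, education values, and periods below are all hypothetical.

```python
def melt_wide(wide_rows):
    """Turn one row per group (with a year -> value mapping) into one row
    per group-year."""
    long_rows = []
    for row in wide_rows:
        for year, edu in sorted(row["by_year"].items(), reverse=True):
            long_rows.append({
                "statename": row["statename"],
                "group": row["group"],
                "year": year,
                "Edu": edu,
            })
    return long_rows


def attach_status(long_rows, epr_periods):
    """Add the status of the EPR period covering each group-year, or None
    when no period covers that year."""
    merged = []
    for row in long_rows:
        status = None
        for period in epr_periods:
            same_group = (period["statename"], period["group"]) == (row["statename"], row["group"])
            if same_group and period["from"] <= row["year"] <= period["to"]:
                status = period["status"]
                break
        merged.append({**row, "status": status})
    return merged


# Hypothetical inputs, for illustration only.
wide = [{"statename": "Country A", "group": "Group X", "by_year": {2015: 2.4, 1975: 1.1}}]
periods = [
    {"statename": "Country A", "group": "Group X", "from": 1970, "to": 1990, "status": "DISCRIMINATED"},
    {"statename": "Country A", "group": "Group X", "from": 1991, "to": 2015, "status": "JUNIOR PARTNER"},
]
group_years = attach_status(melt_wide(wide), periods)
```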

Preliminary Figures & Analyses

As of now, I have a preliminary dataset that contains ethnic group educational attainment rates from 1969-2015 in the following countries:

##  [1] "Algeria"       "Benin"         "Botswana"      "Burkina Faso" 
##  [5] "Burundi"       "Cameroon"      "Cote d’Ivoire" "Egypt"        
##  [9] "Ghana"         "Guinea"        "Kenya"         "Lesotho"      
## [13] "Liberia"       "Madagascar"    "Malawi"        "Mali"         
## [17] "Mauritius"     "Morocco"       "Mozambique"    "Namibia"      
## [21] "Niger"         "Nigeria"       "Senegal"       "Sierra Leone" 
## [25] "South Africa"  "Swaziland"     "Tanzania"      "Tunisia"      
## [29] "Uganda"        "Zambia"        "Zimbabwe"

For each year, it also includes the EPR ethnic group status information (whether the group was monopoly, dominant, senior partner, junior partner, powerless, discriminated, or irrelevant). I’ve coded that to be “in power” or “not in power”, as defined by EPR. Groups that are “monopoly, dominant, senior partner, or junior partner” are considered to be in power. With this information, I can make figures for each country such as these (where each line is an ethnic group over time):

The creation of this dataset simultaneously supports two of my papers for my dissertation.

1. Education & Democratization/Autocratic Failure

The original purpose of this dataset was to ultimately enable a cross-country analysis of the effect of ethnic group educational attainment on the likelihood of democratization. Whereas past analyses primarily focus on:

  • educational attainment per country year –> democratization

I will instead be able to:

  • educational attainment of ethnic group * ethnic group status –> democratization

My argument is that as the educational attainment of marginalized groups increases, so does the likelihood of democratization. EPR also includes extensive information on group size and other factors, so I could do a more extensive analysis.

2. Education, Ethnic Group Identity, and Political Attitudes

A core assumption of my theory is that education uniquely impacts marginalized communities as opposed to advantaged communities. I argue that education will foster comparatively stronger pro-democratic attitudes among marginalized groups as democracy is a means to inclusion. Alternatively, education will foster state-support (pro-authoritarian) views among advantaged groups to protect the status-quo.

I have another working paper that looks at the individual-level survey data from Afrobarometer to predict when individuals are more likely to identify with the state or with their ethnic group. I argue that education will lead marginalized individuals to be more likely to identify with their ethnic group, and that education will lead advantaged individuals to be more likely to identify with their state.

Now that I have the merged Afrobarometer data that also includes EPR information, I can provide a hierarchical analysis of political attitudes dependent upon individuals’ membership in excluded/included ethnic groups.

##   COUNTRY Statename ccode       RESPNO age edu primary secondary tertiary
## 1       1     Benin   434 BEN0001-2008  38   4       1         0        0
## 2       1     Benin   434 BEN0002-2008  46   2       0         0        0
## 3       1     Benin   434 BEN0003-2008  28   4       1         0        0
## 4       1     Benin   434 BEN0004-2008  30   3       1         0        0
## 5       1     Benin   434 BEN0005-2008  23   4       1         0        0
## 6       1     Benin   434 BEN0006-2008  24   4       1         0        0
##   language year identity rural female employment employed democracy
## 1      100 2008        2     0      1          0        0         3
## 2      104 2008        4     0      0          1        0         3
## 3      101 2008       NA     0      1          2        1         3
## 4      100 2008        5     0      0          1        0         2
## 5      100 2008        5     0      1          1        0         3
## 6      100 2008        5     0      0          1        0         2
##   democracyInCountry satisfiedDemInCountry trustPresident trustParliament
## 1                  4                     4              3               1
## 2                  4                     4              1               1
## 3                  4                     2              3               2
## 4                  3                     2              1               1
## 5                  2                     2              3               3
## 6                  3                     3              1               2
##   trustRP treatedUnfairly languageName                                group
## 1       1               0          Fon                  South/Central (Fon)
## 2       0               0       Yoruba Southeastern (Yoruba/Nagot and Goun)
## 3       2              NA         Adja                  Southwestern (Adja)
## 4       0               0          Fon                  South/Central (Fon)
## 5       1               0          Fon                  South/Central (Fon)
## 6       2               1          Fon                  South/Central (Fon)
##    size         status timeDiff ageUpdate
## 1 0.330 JUNIOR PARTNER        7        45
## 2 0.185 JUNIOR PARTNER        7        53
## 3 0.150 JUNIOR PARTNER        7        35
## 4 0.330 JUNIOR PARTNER        7        37
## 5 0.330 JUNIOR PARTNER        7        30
## 6 0.330 JUNIOR PARTNER        7        31

Looking Ahead

There are a handful of obstacles yet to overcome:

  • Across all respondents in the Afrobarometer surveys 4-6, only 70% are currently matched with EPR. This is likely due to my having to drop instances where 1 language matches 2+ incompatible ethnic groups.
  • This is only information on Africa (and 35 countries therein). It would be ideal to find a way to do this globally, potentially with information that could apply cross-country. I have explored doing so with sources like the World Values Survey or the Demographic and Health Surveys; however, I would then need to find a different way to link individuals’ ethnic groups with a dataset like EPR.
  • I don’t currently have confidence estimates for each estimate. It would be good to provide upper- and lower-bound estimates for each education estimate, dependent upon the number of respondents that are aggregated in that estimate. Alternatively, I am also considering multi-level regression with poststratification, where I can combine information from multiple datasets to create stronger estimates.
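One simple way to attach upper and lower bounds to each group-year estimate is a normal-approximation interval around the mean, which widens as the number of aggregated respondents shrinks. The Python sketch below illustrates that idea under stated assumptions; it is not part of the current dataset. It treats the education codes as numeric (as the averages above already do) and uses z = 1.96 for an approximate 95% interval, which is unreliable for very small cells.

```python
import math
from statistics import mean, stdev


def mean_with_interval(values, z=1.96):
    """Return (mean, lower, upper, n) using a normal-approximation interval.

    With fewer than two respondents no interval can be computed, so the
    bounds are returned as None.
    """
    n = len(values)
    m = mean(values)
    if n < 2:
        return m, None, None, n
    se = stdev(values) / math.sqrt(n)  # standard error of the mean
    return m, m - z * se, m + z * se, n


# Made-up education codes for one group-year cell.
estimate, lower, upper, n = mean_with_interval([1, 2, 3, 4, 5])
```

Reporting `n` alongside the bounds makes it easy to flag group-years that rest on only a handful of respondents.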